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Abstract 

o 

psj . We consider the problem of statistical inference for the effective dynamics of multiscale dif- 

fusion processes with (at least) two widely separated characteristic time scales. More precisely, 
^ . we seek to determine parameters in the effective equation describing the dynamics on the longer 

diffusive time scale, i.e. in a homogenization framework. We examine the case where both the 
drift and the diffusion coefficients in the effective dynamics are space-dependent and depend on 
multiple unknown parameters. It is known that classical estimators, such as Maximum Likeli- 
hood and Quadratic Variation of the Path Estimators, fail to obtain reasonable estimates for 
^ I ■ parameters in the effective dynamics when based on observations of the underlying multiscale 

, diffusion. We propose a novel algorithm for estimating both the drift and diffusion coefHcients 

in the effective dynamics based on a semi-parametric framework. We demonstrate by means of 
extensive numerical simulations of a number of selected examples that the algorithm performs 
J2 ■ well when applied to data from a multiscale diffusion. These examples also illustrate that the 

algorithm can be used effectively to obtain accurate and unbiased estimates. 
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1 Introduction 



■ 

X 



Problems with multiple temporal and/or spatial scales emerge naturally in a wide variety of fields 
in science and engineering. From biological phenomena |8] and atmosphere and ocean science |40| 
to molecular dynamics |21) , material science |15) , fluid and solid mechanics |241 [26] . Many of such 
systems are often subject to noise that is either due to thermal fluctuations |13j, randomness in the 
I environment (e.g. uncertainty in some parameters) |25 | I52 | coarse-graining of high-dimensional 
deterministic systems with random initial conditions |41| [56] , or stochastic parameterization of small 
scale effects |12| . 

Mathematically the influence of noise in a system can be described by a single or coupled, nonlin- 
ear stochastic differential equations (SDEs) with multiple scales. In many cases these equations are 
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high-dimensional but from a practical point of view only the evolution of some components of the so- 
lution is of main interest, since they act on a slower scale. It is therefore desirable to approximate the 
full system by an adequate simplified low-dimensional effective model (coarse-graining) that retains 
the essential dynamic characteristics of the full system. The effective equation is often amenable 
to analytical and numerical work. However, usually only the complete multiscale (fast/slow) sys- 
tem is directly observable but not the effective dynamics. Consequently, much research has been 
undertaken to find accurate approximations to the effective dynamics |19| . 

In many applications a wealth of data (i.e. observations of the fast/slow system) is often available 
and it is therefore worthwhile to use these data to determine the effective dynamics. However, 
extracting the effective dynamics directly from the available data is often not straightforward and 
it is thus important to adapt techniques from analysis, statistics, and numerical analysis to obtain 
appropriate coarse-grained models. 

This data-driven coarse graining methodology has been applied to relatively simple (stochastic) 
systems for which a low dimensional coarse-grained equation exists, using techniques such as ho- 
mogenization and averaging. For example, in |46| Brownian motion in a two-scale potential was 
studied. Therein, through both rigorous mathematical analysis and numerical simulations, it was 
shown that the estimation of drift and diffusion coefficients in the coarse-grained model is asymp- 
totically biased when using classical estimators. Furthermore, it was shown that subsampling the 
available data at an appropriate rate between the two characteristic time scales of the full system 
is necessary for an accurate estimation of both drift and diffusion coefficients in the coarse-grained 
model. More general fast/slow systems of SDKs for which a coarse-grained equation exists were 
studied in |45| where it was shown that the same issue of asymptotically biased estimators persists 
in the homogenization framework - that is, when considering an effective dynamics on the (longer) 
diffusive time scale - and that appropriately subsampled data reduce the bias. These techniques 
were then applied to the problem of estimating eddy diffusivities from noisy Lagrangian observa- 
tions in [9] where an improved algorithm that combines subsampling with appropriate averaging 
and variance reduction techniques was proposed and tested. Furthermore, inverse problems for 
multiscale partial differential equations - a problem closely related to that of parameter estimation 
- were studied in |44| . 

Although appropriate subsampling can potentially improve the accuracy of estimators in the con- 
text of multiscale diffusions, as it yields unbiased estimators, the question of an optimal subsampling 
rate (i.e. the rate for which the biased is removed) remains open; e.g. the studies in |46| I45| provide 
only existence results. Furthermore, the numerical experiments presented in the aforementioned 
works indicate that the optimal subsampling rate is not only problem dependent, but is also differ- 
ent for different parameters in the same model. Consequently, an optimal subsampling rate is in 
general unknown. The problem of identifying an optimal subsampling rate for Gaussian processes 
has been studied in [3l see also |55) for related work in the context of econometrics. 

Related problems have been studied in the context of numerical analysis for SDEs with multiple 
time scales. In particular, the heterogeneous multiscale method (HMM) \54\ [T2] is based on the 
idea of evolving the solution of the low-dimensional coarse-grained equation, when the coefficients 
in the coarse-grained equation are being evaluated "on the fly" by running short runs of the un- 
derlying fast dynamics. Similar ideas have been proposed in the framework of the "equation-free 
methodology" introduced by Kevrekidis and collaborators; see e.g. \53\ \3T\ \30\ I32j . Recently, the 
HMM methodology has been extended to approximate stochastic partial differential equations with 
multiple timescales As such, these techniques can be considered as a hybrid between numeri- 
cal analysis and statistical inference since the coefficients in the coarse grained SDE are estimated 
from data that are obtained from short runs of the full dynamics. We emphasize, however, that 
in the HMM the fast dynamics is assumed to be known, whereas in the aforementioned works on 
parameter estimation for multiscale diffusions, as well as in the present work no such assumption is 
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made. 

We also mention that the effect of the muhiscale structure on the evolution of the coarse-grained 
probability density using the Fokker-Planck equation (Kolmogorov's forward equation) was studied 
in |17) . In this study it was shown that when decreasing the spatial discretization in a finite difference 
approximation the error increases rapidly and that in order to avoid this, it is necessary to improve 
the accuracy of the estimators of the drift and diffusion coefficients. 

In many cases of interest the noise in the coarse-grained equation appears in a multiplicative 
way. One is thus confronted with the problem of estimating parameters in both the drift and the 
diffusion coefficients of an SDE of the form 

dxt = f{xt;i})dt+g{xt;e)dWt , 

with unknown parameters (t?, 0)"^ G 0, the set of all feasible parameters. The problem is further 
complicated by the fact that the parameters '& and 6 may not be independent as for example in 
the problem of Brownian motion in a two-scale potential, see Section 13.2.21 In the absence of 
multiscale effects in the data, or if one assumes that the optimal subsampling rate is known, a 
combination of the maximum likelihood estimator (MLE) and the quadratic variation of the path 
(QVP) is the most commonly used estimator in practice; see |5H [36l [37] for background material on 
classical estimators. The QVP (or a variant thereof) is used to estimate parameters in the diffusion 
coefficient and, based on these estimates, the MLE is used to obtain estimates for the parameters in 
the drift. As the MLE is based on an SDE with unit diffusion, one usually transforms the original 
SDE into an SDE with unit diffusion coefficient by applying Ito's formula to 

he{x)= r g{u-erUu, (1) 



for an arbitrary c in the state space of the process x and replacing 9 with the estimator obtained 
from the QVP for concreteness. For example, when considering 



g{x-,e) = y^TT^, 

which is the diffusion coefficient in the coarse-grained equation of the stochastic Burgers' equation 
(see Section [3. 2. 4p . then transformation ([1]) reads 



^ ^ ^ In ( V^x + V^i + O2X') - In (V^c + V^i + 026') 
hg[X) = 



where 6 = (^1,^2)"^ has to be replaced by a previously obtained estimator. 

Notice that this transformation can be singular when implementing it in practice, so that special 
care has to be taken when performing numerical simulations. In particular within the MLE frame- 
work where a (nonlinear) objective function needs to be maximized, this might cause problems. An 
alternative approach to estimate parameters is based on the MLE for the discretized approximation, 
e.g. obtained by the Euler-Maruyama scheme |33l ch. 9.1], of the time-continuous SDE. However, 
the non- vanishing (fixed) time-step size introduces an additional bias so that even for simple models 
this approach does not necessarily yield consistent estimators |38| . 



The main aim of the present study is to develop statistical inference techniques that enable us 
to estimate parameters in both drift and diffusion coefficients of a coarse-grained equation in the 
presence of an underlying (either stochastic or deterministic) multiscale structure in the fast/slow 
system. Furthermore, we wish to extend this approach in a semi-parametric framework, to situations 
where the drift and diffusion coefficients can be expanded in an appropriate (e.g. Taylor series) 
expansion. 
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Besides the lack of reliable statistical inference techniques for these problems, the motivation for 
this study originates also from recent results on the derivation of coarse-grained equations (also 
known as amplitude equations) for stochastic partial differential equations (SPDEs) with quadratic 
nonlinearities [6]. A typical example for such a SPDE is the stochastic Burgers' equation 



on [0, vr] equipped with appropriate boundary conditions. Therein Q denotes the covariance oper- 
ator, W space-time white noise, and < e ^ 1. To study solutions to ([2]) of 0{e) on time scales 
of 0(e~^), i.e. ensuring that we are in the regime described by amplitude equations, a diffusive 
rescaling is performed by defining v via ev(e^€) = u(i). Then v solves 



If the SPDE is equipped with homogeneous Dirichlet boundary conditions it can be shown that the 
dominant mode of the solution to (jS]) can be approximated by the solution to an one-dimensional 
SDE driven by classical Brownian motion Wt of the form 



Since this class of SPDEs arises in many different applications - from population biology |25j to 

fluid dynamics |50l H9] where many data are available - it is of major interest to obtain effective 
dynamics for the dominant modes of solutions to the SPDE by means of a data-driven coarse 
graining methodology. 

The statistical inference technique we propose here consists of two steps. First, we use the mar- 
tingale property of the stochastic integral to obtain an equation involving only the drift but not the 
diffusion coefficient of the SDE. Since the drift might depend on multiple (unknown) parameters, it 
is generally impossible to obtain the parameters uniquely from a single equation. The main element 
we employ to overcome this under-determined situation is the often disregarded initial condition. 
In fact, by varying the initial condition of the SDE one can define the estimator for the drift pa- 
rameters via the best approximation of a system of equations. For the second step, we rely on the 
estimators for the drift parameters and on the Ito isometry to obtain a relation for the unknown 
parameters in the diffusion. Using the same idea as in the first step, that is by varying the initial 
condition of the SDE, we can also define the estimators for parameters concerning the diffusion 
via the best approximation of a system of equations. In both steps the estimators rely on many 
short trajectories (i.e. on ensemble averages) in contrast to classical estimators that rely on a long 
trajectory (i.e. on ergodicity) of the underlying process, see e.g. |51j . 

The main practical advantages of the methodology proposed here can be summarized as follows: 

(a) Coarse-grained models of high-dimensional, possibly infinite-dimensional, problems that retain 
the essential dynamic characteristic of the full system are a powerful tool to study problems 
arising in science and engineering. The simplicity of such an effective model, typically an 
SDE, makes it attractive for mathematical and numerical scrutiny. When the coefficients of 
the SDE can be computed exactly, the effective SDE allows us to decipher rapidly some of 
the basic characteristics of the full system, e.g. by computing first-passage properties and 
properties alike as it was done for instance in [SUI I49| . However, in several problems it is 
not straightforward to obtain the coefficients of the coarse-grained SDE exactly due to the 
complexity of the underlying full system. From a theoretical standpoint several assumptions 
need to be made, e.g. the dynamics is dominated by a single eigenfunction/mode (often as- 
sociated with a symmetry in the system) and the higher-order modes decay sufficiently fast 




(2) 




(3) 
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(see the derivation of the "phase-diffusion" equation describing the transverse instabihty of 
propagating waves/fronts; e.g. in [351 I28|). to obtain computable coefhcients. Conversely, 
from a practical point of view obtaining approximations (estimators) of the coefhcients in the 
coarse-grained model, which can be used to study the basic characteristics of the full system, 
is very appealing. 

(b) A model for a physical or technological process might not be readily available, either because 
its derivation is cumbersome, or it is not straightforward to formulate it from first principles. 
But the underlying physics of the phenomena at hand and previous experience with similar 
systems suggests an SDE of the form adopted here, at least in certain regions of the parameter 
space. In a spirit similar to the equation-free approach, one can utilize available data and 
obtain a low-dimensional approximation to the process, which in turn can be used as a model 
for the process in regions of the parameter space consistent with the assumptions imposed 
from the outset. But often such models can be applicable in regions beyond those dictated by 
the assumptions. For instance, for a long-wave instability such as that observed on a surface of 
a film fiowing down an inclined plane |29| the growth rate curve of infinitesimal disturbances 
extends from the origin up to a cut-off wavenumber. The Landau-Stuart equation is then 
only applicable sufficiently close to the cut-off, i.e. it can be used to describe the transition 
when an unstable wave motion of given ("fundamental") wavenumber interacts with its first 
stable harmonics. On the other hand, when the growth rate curve has a "nose" near criticality 
consistent with a short-wave instability and so that the system equilibrates to a stationary 
norm solution in the nonlinear regime, such as with with Rayleigh-Benard convection |10) . the 
Landau-Stuart equation can be applicable even past the instability threshold. 

The rest of the paper is organized as follows. In Section [2] we present the precise derivation of the 
estimators for systems with and without multiscale structure. Based on these results, in Section [3] 
the general applicability and performance of the prosed methodology is investigated via different 
numerical examples. The extensive numerical study we undertake illustrates that the proposed 
technique enables us to estimate accurately parameters in multiscale diffusions. A summary of the 
results and future perspectives are offered in Section 21 

2 Estimators 

We present the precise derivation of the drift and diffusion estimators for systems without and sys- 
tems with multiscale effects present. First we outline the methodology for SDEs without multiscale 
structure and illustrate some properties of the estimators in this case, before presenting the set-up 
for multiscale diffusions. 

2.1 Derivation of Drift and Diffusion Estimators 

For the sake of simplicity we consider here only one dimensional real-valued processes (see Section 
l3.2.3l for an example of a multivariate process). Consider the scalar- valued Ito stochastic differential 
equation 

dxt = f{xt)dt + dWt , x{0) = xo = ^, (4) 

where W denotes standard one-dimensional Brownian motion. Both the drift coefficient / and 
the diffusion coefficient g are assumed to be sufficiently smooth such that the SDE provides a 
unique solution for any initial condition ^ E M and on any finite time interval; see e.g. |34| sec. 1]. 
Moreover, we assume that both drift / and diffusion g depend on unknown parameters and the task 
is to estimate the parameters in / and g from available data. In fact, here we focus on the case 
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when f{x) and g(x) are polynomials in x of degree maxjjj} and maxjjg}, respectively, where 
Jf,Jg C No denote index sets of finite cardinality p = \ Jf\ and q = \Jg\ respectively. The unknown 
coefficients of the polynomials are "d = £ W and 9 = {Oj)-^j G respectively, i.e. we 

consider 

f{x) = f{x;^) ■.= ^'djX^ and g{x) = g{x- 9) := ^ 6 jX^ . (5) 

Consequently, / and g are linear functions in •& and 9, respectively. These particular assumptions 
on the drift / and the diffusion g simplify the notation in what follows and they will lead to a linear 
system of equations for the estimators of the parameters. 

The starting point in the derivation of the estimators is based on the following identities 

n^t-i)= l\{f{xs))ds, (6a) 
JO 

^[{xt - C - ^ f{xs) dsf) = ^ E(5(x,)) ds , (6b) 

owing to the martingale property of the stochastic integral and the Ito isometry, respectively, hold- 
ing for any fixed initial condition ^. The next step is to incorporate the parameterization of the 
functions / and g into ([6]) , to identify the functional structure of the relation for the parameters 
and 9, respectively. We begin with substituting the parameterization of / into equation ()6ap . which 
will yield an estimator for the parameter that is present in the drift term alone. Based on this 
estimator it is possible to proceed similarly with equation (j6bp and eventually obtain an estimator 
for the parameter 9 present in the diffusion part. 

Substituting ansatz ([5]) for / into ([Sa]) yields 

nxt-o = yz^j fnxj)ds , (7) 

for a given initial condition ^. Therein E denotes the expectation with respect to the Wiener 
measure and with respect to processes starting at a fixed initial condition ^. To emphasize this 
dependency on the initial condition we will use the notation E = E^. Fix a time t > (the question 
how to chose the final time t will be addressed in Section [3]) and define 

6i : M 9 ^ ^ 6i(0 := lK^{xt - ^ G K 

G 



ai:R3 aiiO := ^^{xs^) ds 



With these definitions equation ([6a]) can be rewritten as 

ai(C)^^ = ^i(e). (8) 

The above equation is under-determined for p > 1. To derive a well-defined estimator for we 
consider a finite sequence of initial conditions (Ci)i<j<m ^ith m > p. Since ([8]) is valid for each 
initial condition, this approach yields a system of linear equations 

= hi , (9) 

with Ai := (ai(Ci)^)i<i<^ G ^""""^ and h := (fti(Ci)) i<i<^ G I^"- The linear system does not 
have a unique solution in general (if a solution exists at all). To overcome this shortcoming we 
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define the solution of the system of linear equations in ([9]), i.e. the estimator of the drift parameter, 
to be the best approximation 



:= arg min ||s||2 , 5i := {z G M''': H^iz — 61II2 — )• min}, 

respectively, 

^ := A+b, , (10) 

with Af being the Moore- Penrose pseudo- inverse [5]. We note that the estimation of parameters 
in the drift does not require knowledge of the diffusion coefficient. 

Assume now that we have already estimated the parameters in the drift / correctly. Then 
substituting the ansatz ([5]) of 5 into (|6bp yields 



E 



({xt-^- f f{xs;'d)ds)') = Te, [\{xj)ds. 



Recall again that the expectation is with respect to the Wiener measure and processes starting at 
^, hence E = E^. To cope with multiple parameters, we follow the same approach as for the drift 
parameters above. In fact, define here 



62: M 9^^62(6 ■.= E^(^{xt-C- 1^ f{xs;^)ds) 



and consider again a finite sequence of initial conditions (6)i<j<m- Then we also obtain a system 
of linear equations for the parameters 

A2^ = 62 , (11) 

with A2 := (a2(?i)'^)i<i<m 6 ^"''^'^ and 62 := (&2(Ci)) i<i<^ S W^. We define the estimator again 
via the best approximation 

e := A+b2 . (12) 

Since the estimation of diffusion parameters 9 is based on the estimators for the drift parameters, 
additional error sources might affect the estimator 9 in practice - see Section 12.31 for an example of 
this error propagation. 

In practice we are confronted with discrete observations instead of continuous ones, so that we 
need to approximate the (deterministic) integrals in ai(-), a2(-), and b2{-)- Assume that we have 
(n + 1) observations at equidistant times tk '■= kh for < A; < n and h := t/n. The goal is 
to approximate the integrals by means of these observations. Since the integrands depend on the 
path of the solution of an SDE, we cannot expect the integrands to be very smooth. For such 
"rough" functions the trapezoidal rule is more accurate than Simpson's rule |llj . Consequently, we 
approximate the various integrals via the composite trapezoidal rule 



/' 

Jo 



E{xs^)ds = V / E{xJ)ds ^ -(e(xo^) +E(xt^) +2 V: 
'0 fe=o-^*^ k=l 

where we used that to = and tn = t. 
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2.2 Description of the Algorithm: An Example 



To apply the actual parametric estimation procedure introduced in the previous section to a spe- 
cific problem, not only data need to be available but also a parameterization needs to be chosen. 
Consequently, the complete algorithm can be understood as consisting of two stages: 

1. Initialization: The time step h is given by the underlying time series of observations and 
is assumed to be constant (however, this assumption is not necessary, in fact, the procedure 
might be carried out in the exact same manner with a non-equidistant sampling rate) and 
the terminal time t = nh is fixed by choosing n appropriately (cf. Section [3|). Expectations 
are approximated by averages over N trajectories generated form independent Brownian 
motions. The crucial step is to fix a parameterization for both drift and diffusion coefficients. 
Lastly, the sequence of initial conditions (Ci)i<j<m ii^eds to be chosen appropriately. 

2. Two-step Estimation: Based on the initializations in the previous stage the estimators are 
well-defined. According to Section 12.11 the parameters 'd and 9 are estimated successively, 
first the parameters in the drift and then the parameters in diffusion coefficient. For both 
estimators, two steps need to be performed: 

a) Assembling the linear system equations ([9]) and (jlip respectively. 

b) Solving the arising systems via best approximation (jlOp and ()12p respectively. 

Since the estimation step depends on the considered parameterizations for drift and diffusion, we 
present a detailed pseudocode of the methodology in Algorithm [1] for the example using Jf = {1,3} 
and Jg = {0, 2}, i.e. p = 2 = q with 

f{x;^) = ^ix + ^3X^ and g{x; 9) = Oq + e2X^ . (14) 

This setting corresponds to the Landau-Stuart equation that will play a vital role in the numerical 
examples discussed in Section [3j The input arguments of Algorithm [1] are the time step size h and 
the data array X. The dimension of the array is a result of m different initial conditions each with 
N trajectories of (n + 1) observations. When we denote by x^^ the value at time t of the i-th 
trajectory started initially in then X corresponds to the collection of these trajectories at discrete 
(equidistant) times. For the example we consider here, we define approximations of the first three 
moments at time t via 

N N N 

i=l i=l 1=1 

Thus, the quantities defining the matrices and right-hand sides involved in the estimation step (cf. 
([9]) and ([TT]) . respectively) are approximated via 

ai(0 ^ ^{Qn{x.\^),Qn{x.\^)) , hiO^Xti^-^ (15a) 
a2(0'^«-(2n,Q„(x.|5))^, 62(6 « iV"' E " ^ " ^-(/^ ° ) ' (l^b) 

i=l 

with = f {■;'&) and Qn denoting the quadrature operator of the trapezoidal rule on [0,t] with n 
equally spaced [h = t/n) subintervals (cf. ()13p ) 

h 1—1 
Qn{u) := -{uo + ut + 2'^Ujh) . (16) 
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Algorithm 1 Algorithm for the estimation of the parameters in the drift and diffusion coefficients 
in m 



Require: h>0 and X G wnxNx{n+i) 
for i = 1 to m do 
for J = 1 to n do 

^ l^k=l N 

end for 

h ^ an- 
end for 

for i = 1 to m do 
for J = 1 to n do 

j_ (^i,fc,j+i)^ 

end for 

Ail nh 

A2^|((^.,l,l)'+7n + 2EJ=lSj) 

for j = 1 to A do 

6j ^ |(/(Ai,i,i;??) + /(Ai,,-„+i;^9) + 2EL2/(Aj,fc;^)) 
end for 

end for 

e^A+b 

T oT\T 



return {^'^,9'^) 
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It should be emphasized that equations such as x = A^b (e.g. as in hues 10 and 22 in Algorithm [T]) 

1 1 2 

are merely meant as a formal notation for x solving the least squares problem x = argmin^g^ \\s\\27 
S = {z: \\Az — 6II2 — )• min}, cf. equations ()10p and (|12p . rather than indicating that we first compute 
A'^ and then multiply it with b to obtain x, a step which clearly is computationally inefficient. 
In practice the method to compute the solution typically depends on the rank of A. For the 
numerical examples that we will present in Section [3] we ensured a full rank situation by considering 
a sufficiently large number of different initial conditions. The solution x can be computed with one 
of the several methodologies for solving least squares problems, e.g. the Cholesky factorization of the 
normal equation^ A^Ax = A'^b; for details on such methodologies we refer to standard textbooks 
on numerical linear algebra, e.g. |20| ch. 5]. 

2.3 Properties of the Estimator 

The proposed estimation procedure relies on two key ingredients. The first is that the methodology 
is based on the identities in ([6]). The second is that by considering a finite sequence of initial 
conditions we can cope with multiple parameters in drift and diffusion coefficient, respectively. In 
the sequel we demonstrate the influence of both components on the proposed estimation scheme 
with the help of some elementary, nonetheless illustrative, examples when no multiscale effects are 
present. A detailed and rigorous analysis of the proposed methodology for both multiscale and 
non-multiscale situations will be presented elsewhere. 

To illustrate the influence on the identities in ([6]) and to address some asymptotic properties we 
consider a simple Langevin equation with additive noise 

dxt = i^f{xt)dt + VedWt . 

The drift estimator proposed in this study - this parameterization is already a straightforward 
generalization of the one introduced above - relies on the relation 

E{ff{xs)ds)^ = E{xt-xo), (17) 
Jo 

with 'd being the true value. For a fixed final time t < 00, the estimator for continuous-time obser- 
vations - meaning that we approximate only the expectation by an average but do not approximate 
the integrals - based on a single (fixed) initial condition xq = ^ is given by 

^5 _ Eliift - ^0) _ , V^Ef=i /d _ ^ N\0, 9t/N) ^^g^ 
Eili /o ds YlZi /o fi^i) ds jf Eili /o ds 

Notice that we dropped the dependency on the initial condition, because only a single initial con- 
dition is considered here. Since we approximate only the expectations by finite averages, it is not 
surprising that the estimator for continuous-time observations (jlSp converges to the true value in 
agreement with the law of large numbers. The property that the variance of the error vanishes for 
— 7- 00 reflects the fact that the estimator relies on an identity, i.e. on a direct (deterministic) 
computation pT|) rather than on asymptotic time limits (i.e. on ergodicity). 

Recall that we have (n -|- 1) observations at times = to < ti < ■ ■ ■ < tn = t with t > fixed, in 
the case of discrete-time observations. The integral approximation introduces an additional error 



^This method for solving the least squares problem is simple to implement but might not be optimal and is usually 
only recommended for problems with large residuals. However, it is noteworthy that for the numerical examples 
presented in this study, the method performed well and no stability issues occurred (as can be demonstrated by 
monitoring the condition number; not shown here), and similar results were obtained with a QR factorization 
with column pivoting (not shown here). 
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that can be identified via 



t 

Us ds = Qn{u) + Cn , (19) 



with an appropriate constant € M that depends not only on n but also on u and t, and which 
vanishes in the limit as n — )• oo. Here Qn denotes the quadrature operator of the trapezoidal rule as 
defined in ()16p . The actual error of the trapezoidal rule depends on the regularity of the integrand. 
Here we rely only on the assumption lim„_>.oo = as a general scenario and do not discuss the 
rate of convergence. Then, similarly to the continuous-time case, the estimator (when neglecting 
sampling errors) can be written as 



ElM - ^o) _ Elii^Qnif o x^) + i^ci, + ^eil dwi] 



EllQnifoX^) EllQnifo 



V C7v(n) - jj Ylf^i lo f{x\) ds) jj YliLi /o fi^l) ds - C7v(n) ' 

with cj\^{n) = N~'^ Xl^^i being the average error constant. Hence, the integral approximation 
introduces an additional bias - second term in the bracket - that can be controlled by n. Notice that 
|cAr(n)| < maxi<j<jv \cn\ so that the additional bias vanishes as n — )■ 00 for every N. Consequently, 
from equation ()20p one infers that for a fixed final time t > 0, one should choose both n, ^ 1 to 
obtain an accurate estimate. 

To estimate the diffusion coefficient the proposed scheme relies on the relation 

t 



:E(^{xt - xo - ^ f{xs)dsf) 



that is valid for all times t > and where we replace i? by its estimator i!) for concreteness. Conse- 
quently, the estimator for continuous-time observation using a single (fixed) initial condition and a 
fixed final time t > reads 

^ = 7iv^("^*"'^°"^/ ^(^'«)^") =m^{^'^~^^ f{xi)ds + Ve dwi 

i=l "^0 i=l "^0 Jo 



^^x^+^^^E(/V(x:)d.)\^^^y:w^/ / fixDds. 



N"^" tN ^\Jo tN ^ 







Since the estimator for the diffusion coefficient depends on the estimated drift parameter, an addi- 
tional error is introduced (last two terms), as expected. To illustrate the asymptotic properties of 
the estimator we, however, assume that the error {J} — i?) is negligible. Consequently, we find that 

where x^r denotes the Chi-squared distribution with N degrees of freedom. Recall that j^^^ ~ 
M{\,2/N) for N sufficiently large, as a consequence of the central limit theorem. 

For the modification of discrete-time observations the integrals are again approximated via the 
trapezoidal rule. Based on the same {n + 1) observations (neglecting sampling errors) at = to < 
ti < ■ ■ ■ < tn = t and t > fixed we find 



tj:jY.V''t-^o-^Qn{fox'.)) = Y.[[^-^) f{x\)ds + V~e dwi + ^ci 

i=l 1=1 •'^ •'^ 
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with being the error representation of the trapezoidal rule, cf. (fT9|) . In contrast to the continuous- 
time situation, the term ??c^ reflects the additional error due to the integral approximation that can 
be controlled by n. Since this additional term is the only difference, expanding the square yields a 
similar result as in the continuous-time situation. If we assume again that the error from the drift 



with CN{n) = X^i^i (Cn) • Since < CAr(n) for every A^, the second term in equation ()2ip 
can only be controlled by n. Consequently, for a fixed final time t, choosing n,N S> 1 it is 
necessary to obtain an accurate estimate of the true parameter. It is noteworthy that the same 
steps may be carried out for an arbitrary diffusion function, but obviously we cannot directly infer 
the distributions of terms involving the stochastic integral. 

To deal with multiple parameters in drift and/or diffusion coefficients the proposed estimation 
scheme relies on considering a finite sequence of initial conditions and defining the estimator via the 
best approximation. To illustrate the effect of this second component of the estimation procedure, 
consider the SDE 



Provided all quantities are well defined, the estimator of ??, via a best approximation using a sequence 
of initial conditions {£,i)i<i<m, is given by 



where ai(-) and &i(-) are as in ([8]) associated with the drift function /. On the other hand, the 
quantity ^i(ft)/'^i('^i) yields a local estimator for each initial condition because the considered 
problem has only one parameter to be determined. Thus the best approximation corresponds here to 
the weighted arithmetic mean of these local estimators with weights a(^i)^. Consequently, increasing 
m includes an additional stabilization effect into the estimation scheme. From this point of view, 
the best approximation resolves naturally the problem of combining local estimates to a global 
estimator that arises in different estimation procedures as well, where piecewise local strategies are 
utilized to improve the estimates; see for example j^. In this study a heuristic estimation strategy 
for non-constant diffusions in the MLE framework is proposed by extracting local information from 
a large time-series to estimate coefficients locally and combine these local estimators to global 
estimators. Since this strategy is based on the MLE, the results in the above study also highlight 
the subsampling issue when applied to multiscale diffusions. 

It should be noted that although the proposed methodology computes moments of the solution 
of an SDE, there is no direct link to the generalized method of moments (GMM) |23| . The GMM 
for parametric estimation for SDEs is more closely related to the MLE instead, as both estimation 
schemes rely on ergodicity of the corresponding process by using long time-series. In fact, both 
estimators even coincide for many cases |22| ch. 14.4]. Since the MLE is biased for multiscale 
diffusions, it can be expected that the subsampling issue also arises for the GMM when applied to 
data obtained form a system with multiscale structure. 



parameter (?9 — is negligible, then we find 




(21) 



i=l 



dxt = ^f{xt) dt + ^g{xt)dWt. 



E:^i«ite)^ite) 
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2.4 Estimators for Multiscale Diffusions 



In the context of diffusion processes with two widely separated time scales we consider the following 
set-up: A fast/slow system of SDEs 

dxt= (^^fi{xt,yt) + Mxt,yt)^ dt + a{xt,yt) dUt , (22a) 
dyt = (\g2{xt,yt) + -gi{xt,yt) +go{xt,yt)) dt + -^{xt,yt)dVt , (22b) 

equipped with appropriate initial conditions. In (|22p [/, V denote Brownian motions of appropriate 
dimensions and < e <C 1 denotes a small parameter. For the dimension of the fast /slow system 
we assume that y. T ^ and (for simplicity) x: T i— )■ M, where T = [0,t] denotes a finite time 
interval of interest. Furthermore, we assume that drift and diffusion functions in (|22ap and (|22b|) 
respectively, are such that there exists a well-defined (i.e. the SDE provides a unique weak solution 
on any finite time interval and for any initial condition) coarse-grained SDE 

dXt = f{Xt) dt + ^g{Xt) dWt , (23) 

in the limit of e — t- 0; see e.g. |471 ch. 11 and 18] and references therein for technical details. That is, 
the slow process x is approximated by the solution of (j23p for e <^1. Even in cases where both the 
effective drift / and the effective diffusion function g are known, the actual computation of these 
expressions might be difficult or even impossible. Hence, our goal is to estimate both effective drift 
coefficient / and effective diffusion coefficient g in (|23p from available data (observations) of the 
fast/slow system ()22p (more precisely, only of its slow component). To this end we assume the same 
parameterizations of the effective drift and diffusion functions as introduced in Section [2.11 

f{x) = f{x;'d) := ^ -djx^ and g{x) = g{x-e) := ^ OjX^ , 

where we recall that Jf,Jg C No denote index sets with p = \Jf \ and q = \Jg\. Our goal then 
is: Given observations of the slow component (|22ap . is it possible to estimate the parameters 

= G and 9 = {6j)j^j G M''? Under the assumption that the (ergodic) fast process 

(|22bp is stationary, the algorithm described in Section 12.11 applies straightforwardly also for this 
problem, given the final time t of observation length is appropriately chosen; cf. Section 13.21 That 
is, using multiple initial conditions for the slow process ()22bp to deal with multiple parameters in 
drift and diffusion, while the fast process is sampled from its invariant measure that is assumed to 
be known, either analytically or numerically. 

The main motivation to consider the same algorithm also in the presence of multiscale effects 
originates from the fact that both the slow component of the full fast/slow system and the effective 
dynamics have probability laws that are approximately the same, provided e <C 1. Consequently, 
also expectations with respect to these laws are approximately equal. Since the proposed method- 
ology is based on expectations, cf. equation ([6]), we believe that this approach yields asymptotically 
unbiased estimators. A rigorous analysis of the proposed methodology to verify this heuristic argu- 
ment is currently work in progress and will be reported elsewhere. 

We conclude this section with a remark on a recently proposed estimator for constant diffusion 
coefficients |16) that can also be derived using the approach we introduce here. To this end, assume 
that the coarse-grained equation takes the form: 

dxt = f{xt)dt + ^dWt , 

where we assume that the effective drift / is known and we wish to estimate the diffusion coefficient 
6 from available data of a fast/slow system. When considering a single (fixed) initial condition and 
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following the approach introduced in Section 12. H the resulting estimator reads 

i=l j=0 ''^^ 

where the integrals are being approximated by a quadrature rule and t is chosen appropriately (in 
fact, we have approximated the Lebesgue integral via the trapezoidal rule). If one, however, uses a 
rectangular-method with the left corner node instead, this estimator coincides with the estimator 
proposed in |16| for estimating the effective diffusion coefficient based on observations of the slow 
component of a fast/slow system. We emphasize, that a crucial assumption on the estimator in the 
aforementioned work is that the effective drift is known a priori. This assumption is very restrictive 
and makes the estimator unfeasible for most practical applications. A further limitation of the 
estimator in |16) is that it applies only in situations where the noise in the coarse-grained equation 
is additive (constant diffusion coefficient). Conversely, the methodology proposed here aims to 
estimate multiple parameters in both the drift and the diffusion coefficients. 



3 Numerical Experiments 

We now present numerical experiments of parameter estimations for diffusion processes to illustrate 
the behavior of the estimation scheme developed in Section [2j 



3.1 Parameter Estimation for single-scale SDEs 

Let us first present numerical results for parameter estimation when no multiscale effects are present. 
We investigate two different SDEs. In Section 13.1.11 we consider the SDE corresponding to the 
Ornstein-Uhlenbeck process and in Section [3?L2] we examine the stochastic Landau-Stuart equation. 
The purpose of these examples is twofold. On the one hand they are used to illustrate that the 
proposed methodology can be employed successfully in practice to estimate unknown parameters 
and on the other they help us understand the influence of the parameters n, /i, N ^ and m on the 
algorithmic estimation procedure. The time series were obtained by solving the corresponding SDEs 
via the Euler-Maruyama scheme. In all numerical examples reported in this section, we used a time 
step oih = 10-3. 



3.1.1 Ornstein-Uhlenbeck Process 

Consider the following SDE 

dxt = -Axt dt + y/adWt , xq = ^ , (24) 

with the unique solution being the Ornstein-Uhlenbeck process starting at The estimation pro- 
cedure is applied to data from the SDE with true parameters (A, a) = (0.5, 0.5) using a variety 
of different values for the parameters of the algorithm. Figure [1] depicts the relative errors of the 
estimated values as functions of the number of initial conditions m for different combinations of n 



(recall that the final time is t = nh) and N. For the estimated drift parameter A (Figure 1(a)) 
there is a discrepancy among different combinations of n and A'^ for small values of m and the 
relative errors vary from approximately 0.08 to 6. Increasing m decreases the relative errors, with 
fluctuations due to the discretization, as expected; cf. Section 12.31 Moreover, it is apparent that 
the larger n and N , the smaller the relative error. In contrast to the drift parameter, the relative 



errors of the diffusion parameter a (Figure 1(b) ) are already small (relative error smaller than 0.05) 
even for small values of m. This is due to the constant diffusion coefficient of the SDE. Increasing 
m generally decreases the relative error further, again, with fluctuations due to the discretization. 
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(a) relative error of A 



(b) relative error of a 



Figure 1: Relative error of the estimated parameters in (|24p as functions of the number of initial 
conditions m in a log-log scale. The final time of the considered time series is t = nh with 
h = 0.001 and the true parameters are (^,0") = (0.5,0.5). Furthermore, N denotes the 
number of independent Brownian paths. 



We note that although an error propagates from the drift estimation to the diffusion estimation (cf. 
Section [2]), it appears to be negligible in this example. Based on theses results, it seems plausible to 
tune the algorithm-defining parameters in a way such that the estimators provide a given relative 
accuracy while at the same time the computational cost is minimized. Obviously the question of 
optimized algorithm-defining parameters is of high importance in practical applications. However, 
this is not a straightforward issue to address as not only the discretization related parameters n 
and iV influence the accuracy of the algorithm in practice, but also the number of initial conditions 
m as well as their locations; we will investigate these and related issues in a future study, see also 
the discussion in Section |4l 



3.1.2 Landau-Stuart Equation 

Consider the stochastic Landau-Stuart equation |35l ch. 2.2], where both additive and multiplicative 
noise are present 



dxt = {Axt - Bxf) dt + ^(Ta + abxl dWt , xq = ^ . (25) 

This SDE can be obtained from a wide class of spatially extended systems, e.g. the noisy Kuramoto- 
Sivashinsky equation |50| I49| by assuming near-critical conditions, i.e. being sufficiently close to the 
primary bifurcation, and employing the homogenization theory developed in |6]. In this case we need 
to estimate a total number of four parameters, two in the drift and two in the diffusion coefficient. 

We performed various numerical experiments with different choices of parameters. Figure [2] illus- 
trates the relative error of the estimated parameters as functions of the number of initial conditions 
for different combinations of n and N when the true parameters are {A, B, aa, CTb) = (3, 2, 1.5, 1.3). 
We find qualitatively the same behavior as in the previous section for the Ornstein-Uhlenbeck pro- 
cess: all three parameters n, N , and m affect the accuracy of the estimators. Although the Figures 



2(a) - (d) show a decreasing trend of the relative errors when increasing m, fluctuations are still 
present. These fluctuations are of different magnitude for different parameters and are reduced 
by increasing both n and N . One also observes that increasing m improves the accuracy of the 
estimators but only up to a certain level, i.e. the corresponding curves approach nearly constant 
values for large m, that are due to the approximation of the original system matrix, for instance 
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Figure 2: Relative error of the estimated parameters in (|25p as functions of the number of initial 
conditions m using a log-log scale. The final time of the considered time series is t = nh 
with h = 0.001 and the true parameters are (A, o'b) = (3, 2, 1.5, 1.3). denotes the 

number of independent Brownian paths. 
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in ([9]), by a matrix based on observations and the aforementioned discretizations. Ahhough it is 
apparent that estimating ah parameters in the Landau-Stuart model is more delicate than for the 
Ornstein-Uhlenbeck process in Section I3.1.H it nonetheless seems possible to determine optimal 
algorithm-defining parameters such that the computational cost is minimized given a certain error 
tolerance for the estimators for this model also. 



3.2 Parameter Estimation for Fast-Slow Systems 

Of particular interest is the behavior of the estimator when applied to systems with two different 
time scales. We examine the properties of the estimation scheme for stochastic multiscale diffusions 
(Sections I3.2.1f[3.2.3p . truncated systems of time rescaled stochastic partial differential equations 
(Section I3.2.4p . and deterministic systems that exhibit temporal chaos that can be approximated 
by an appropriate SDE (Section I3.2.5p . As the precise dependency of the estimation procedure on 
the control parameters n, m, N, h is still an open question, the main purpose of this section is to 
illustrate the general applicability of the proposed estimation procedure for multiscale diffusions. If 
not stated otherwise, the generated time series were obtained by solving the corresponding multiscale 
SDEs via the Euler-Maruyama scheme using a time step h = W~^. Furthermore, the expectation 
is approximated by an average using N = 5000 independent Brownian paths and m = 150 different 
(equally-spaced) initial conditions are used. We emphasize once again that this particular choice of 
algorithm-defining parameters might be far from optimal in the sense of computational complexity. 
But since our main goal is to demonstrate the applicability of the proposed scheme to multiscale 
diffusions, these algorithm-defining parameters will yield reliable estimators. 



3.2.1 Fast Ornstein-Uhlenbeck Noise 

When the fast process is an Ornstein-Uhlenbeck process it is rather straightforward to determine the 
precise form of the effective equation associated to the fast/slow system, because this task reduces 
to computing Gaussian integrals. Consider for example 

dxt = (^^a{xt)yt + h{xt,yt) - a'{xt)a{xt)j dt , (26a) 
1 a/2 

dyt = -^ytdt + — dVt , (26b) 
with Vt being a standard Brownian motion, then the effective dynamics is given by 



dXt = h{Xt)dt + .j2a{XtY dWt , (27) 

where h[x) denotes the average of h{x, •) with respect to the invariant measure of the fast process 
(Ornstein-Uhlenbeck process). We note that we have subtracted the Stratonovich correction from 
the drift in ()26ap . so that the noise in ()27p can be interpreted in the Ito senself] In the sequel we 
consider two different choices of the pair h{-),a{-). As a first example let 

/i(x, y) = h{x) = Ax and (t{x) = t/o" , (28) 

then the amplitude equation is precisely the Ornstein-Uhlenbeck process (cf. Section [3. l.ip but here 
in the context of multiscale diffusions where classical estimators fail. To illustrate this failure and 
motivate the necessity of an appropriate sampling rate, the quadratic variation of the path (QVP) 



^The noise entering (|26a|) . i.e. the process H26b|) . is a smoothed approximation to white noise, so that the noise 
in the limiting equation has to be interpreted in the Stratonovich sense according to the Wong-Zakai theorem. 
Correcting the drift is not essential for the applicability of our methodology, it was done so that the limiting 
equation (|27[l is somewhat simpler. 



17 



0.6 
a 
0.4 



-0.6 




2 3 4 5 
&h (h = lO-^*) 

(a) Classical estimators 




0.5 1 

nh (h = 10-3) 

(b) Novel procedure 



Figure 3: Performance of classical estimators and our procedure for both drift and diffusion coef- 
ficients in (|28p . In Figure 



the MLE and QVP as functions of the subsampling rate 
5h {5 = 1 corresponds to no subsampling) are shown. Figure |(b)| shows the result of 
our procedure as a function of the final time t = nh. In both cases the sampling rate of 
the considered time series is /i = 0.001 and the true parameters are [A, a) = (—0.5,0.5). 
Dashed lines denote the true values. 



estimator for the effective diffusion constant and the maximum likelihood estimator (MLE) for the 
effective drift parameter are applied to a time series on [0, 5000] with initial condition xq = 0.5 
generated by the associated fast/slow system (j26p with true parameters {A^a) = (—0.5,0.5) and 



e = 0.1. Figure 3(a) depicts the performance of both the QVP and the MLE as functions of the 
subsampling rate 5h. The parameter 5 indicates here that only every (5-th observation is used to 
estimate the parameters in the drift and diffusion coefficients, hence 5h denotes the time period 
between two consecutive observations. Starting from the case without subsampling {5=1, that is 
5h = 10~3) and increasing 5, both estimators approach the true value. However, after an optimal 
subsampling rate, for which the estimator is as close to the true value as possible (here approximately 
5h = 0.25, that is 5 = 250), both estimators deviate monotonically from the target value. We note 
that at the optimal subsampling rate the relative error is approximately 10%. Figure 3(a) seems 
to suggest that the optimal subsampling rate is given by the local extremum of the estimator as 
function of the subsampling. However, this behavior is not true in general, see for instance the 
numerical examples in [461116) that reveal different behaviors. Recall, that the optimal subsampling 
rate is in general unknown. 

The situation is different when the effective parameters are estimated via the method introduced 
here. Observations are generated by the associated fast/slow system ()26p with true parameters 
{A, a) = (—0.5,0.5) and e = 0.1. The performance of the estimator as function of the final time 
t = nh for both A and a is plotted in Figure [3(b) [ and is compared directly to the results obtained by 
the classical estimators discussed before. For small values of t = nh one observes that the estimated 
value of the drift parameter A fluctuates around the true value and stabilizes for larger times with 
minor fluctuations around the true value. We note that the estimator obtained by the proposed 
scheme significantly outperforms the MLE in terms of accuracyH For the estimated diffusion coeffi- 



^At the optimal subsampling rate the relative error for the MLE is approximately 10%, whereas the relative error for 
the drift parameter obtained by the novel scheme is less than 1% for t = nh > 0.25. Admittedly, this comparison 
is not completely fair, because the novel estimation scheme in its current form relies on more data than the MLE, 
but we believe that these numbers nonetheless illustrate the potential of the methodology we propose. 
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(a) Classical Setting 



(b) Multiscale Setting 



N da CTb N da db 



1 


0.948301 


0.251906 


1 


0.056463 


0.000000 


10 


0.871561 


0.390639 


10 


0.039087 


0.028682 


100 


0.817719 


0.480469 


100 


0.040613 


0.026214 


1000 


0.806024 


0.500243 


1000 


0.040363 


0.026727 



Table 1: QVP estimators when fitting the Landau-Stuart SDE (|30p to observed data. For the 
results outlined in table (a) the observed data were obtained from ()30p . i.e. the classical 
setting, whereas for the results in table |(b)| observations of the slow component ()29ap of 
the fast/slow system (|29p were used, i.e. the multiscale setting. In both cases the true 
parameters are ((Ta,o"b) = (0.81,0.49) and e = 0.1 was used in the multiscale setting. The 
parameter N indicates the number of independent Brownian paths that have been used to 
compute an average. 



cient a one finds that the novel scheme proposed here and the QVP estimator yield similar results 
for small values of t = n/i, indicating that increasing t = nh here has the same beneficial effect 
as subsampling does for the QVP estimator. But unlike the QVP, the estimator proposed here 
approaches the correct value further and closely fluctuates around it when increasing nh, without 
any deviations as when using QVP and subsampling. This is a typical behavior for the estimator 
when applied to multiscale diffusions as we will see in the forthcoming examples. Consequently, one 
finds that, unlike classical estimators, once the final time t = nh is larger than a critical value, the 
estimator fluctuates closely around the true value. 



Consider as a second example h{x, y) = h{x) = Ax — Bx^ and (t{x) = \/ (Ta + crbX^, so that the 
fast/slow system ([26]) reads 

dxt = (^^ytV^^a + crbXt'^ + {A - ab)xt - Bxt^^ dt , (29a) 

dyt = -\ytdt + — dVt , (29b) 

with the effective dynamics in (|27p given by the Landau Stuart equation (see Section [3. 1.2p 

dXt = {AXt -BXt^)dt + ^2{aa + abXt^) dWt . (30) 

A natural extension of the QVP estimator to diffusion coefficients that depend on multiple param- 
eters is obtained by considering the standard QVP relation for different time increments. Provided 
that one considers a sufficient number of increments, i.e. a sufficient number of estimating equations, 
one can define the QVP estimator by solving the arising system. To illustrate that this approach 
can deal with multiple parameters in the diffusion coefficient when no multiscale effects are present 
in the data, we first use the QVP to estimate 0"a and cJb in (|30p based on data that are also obtained 
from (|30p . This (single-scale) situation corresponds to the classical case of parametric estimation 
and the QVP can indeed be used to obtain accurate estimators da and 0"^, as it is illustrated in table 
m^a)[ Therein the obtained estimators da and db are presented for different values of iV, where N 
indicates the number of independent Brownian paths that have been used to compute an average 
to improve the accuracy. The Landau-Stuart equation (|30p was solved numerically on T = [0, 1000] 
starting at X{Q) = 0.5 with true parameters [A, B, aa,crb) = (1, 2, 0.81, 0.49). Apparently, the QVP 
yields very accurate estimates in this setting. However, things change when the QVP is adopted in 
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Figure 4: Performance of the novel estimators A, B, da, and dh for the Landau-Stuart equation (j30p 
as functions of the final time t = nh with h = 0.001. The true effective parameters are 
{A,B,aa,ab) = (1,2,0.81,0.49). 



the presence of multiple time scales. In this multiscale setting we wish to estimate the parameters 
C7a and o"b in the effective dynamics ([30|) from observations of the slow component (j29ap of the 
fast/slow system (|29p . The same true parameters and configuration (i.e. same number and length 
of time increments) of the QVP as in the classical example without multiscale effects were used, 
since the QVP performed well therein. Table \ ^h)\ displays the obtained estimators cJa and ai, for 
different values of N when the observations are obtained from (|29p with scale separation e = 0.1. 
Increasing yields QVP estimators with minor fluctuations but both estimators are strongly bi- 
ased, as expected. Hence, an appropriate subsampling of the data would be required to remove the 
bias, but, once again the optimal subsampling rate is not known a priori and might, as in the case 
of the MLE, even be different for different parameters. 

Conversely, we will use this example to illustrate that the parameters in multiscale diffusions can 
be estimated accurately using the proposed scheme, even though the amplitude equation provides a 
far more involved structure than the one of the previous example in addition to the multiscale struc- 
ture of the problem. Figure U] depicts the performance of the estimation procedure based on observa- 
tions generated by the fast/slow system (|29p with true parameters {A,B,aa,crb) = (1,2,0.81,0.49) 
and e = 0.1 as a function of the final time t = nh. The true values are indicated by dashed lines. 
The behavior of the estimators is qualitatively similar with that in the previous example. By in- 
creasing t = nh, the estimators approach the true values, respectively and fluctuate closely around 
them after a critical final time. Consequently, all parameters can be estimated accurately. 

3.2.2 Brownian motion in a two-scale Potential: A Quadratic Potential in one dimension 

Here we study the example that was originally used in |46| to illustrate the failure of classical 
estimation schemes in the context of multiscale diffusions for the first time. More precisely we 
consider the first-order Langevin equation 

dxt = -VVa(^xt, ^) dt + V2^dUt , (31) 

which is a simple model to describe the movement of a Brownian particle in a two-scale potential 
Va subject to thermal noise - Ut being a standard Brownian motion. Here we consider the one- 
dimensional problem (a two-dimensional example is treated in see Section [3. 2. 2p and further assume 
that the two-scale potential is given by a large scale as well as a fluctuating part: Va{x,y) = 
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Figure 5: Performance of the estimators A, S in (|32p as functions of the final time t = nh with 
h = 0.001. 



aV{x) +p{y). Based on these assumptions we can rewrite the Langevin equation as 

dxt = -(ay'(xi) + V(y)) + V2^dUt . 

When the fluctuating part p is sufficiently smooth and periodic with period L, the effective dynamics 
is given by 

dXt = -AV'{Xt) dt + ^/^dWt , (32) 

where the effective coefficients are given hy A = al?l{Z+Z.) and S = gL^{Z^Z_), where Z± = 
Jq e'^'P^y)!'^ dy, see |46| for details. Here we consider V{x) = x^/2 and p{y) = cos(y), so that (|32p is 
the SDE of an Ornstein-Uhlenbeck process with 

A = —— -TTT and " 



l\2 



where Io{z) denotes the modified Bessel function of first kind, cf. [21 ch. 9.6]. We note that both 
the effective drift and the effective diffusion depend on the diffusion cr of the original fast/slow 
syste nS. Figure [5] shows the performance of the estimation scheme when applied to observations 
of the fast/slow system with [a, a) = (1,0.5) and e = 0.1. As for the examples in the previous 
section both estimators A and S are biased for small final times t = nh. Using longer time series, 
i.e. increasing nh, reduces this bias and both estimators approach the true values (dashed lines) 
respectively. 



3.2.3 Brownian motion in a two-scale potential: A quadratic potential in two dimensions 

To illustrate that the proposed methodology can readily be extended to multivariate processes, we 
consider here a generalization of (j3ip in two dimensions 

dxt = -Vvixt, —■,M]dt + V2^dUt , 



*We remark that in the numerical examples presented in [161 sec. 4.3] the drift coefficient in the homogenized 
equation is assumed to be known when estimating the diffusion coefficient, even though it depends explicitly on 
the unknown a. 
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Figure 6: Performance of the estimators A, S in ()33p as functions of the final time t = nh with 
h = 0.001. 



where y(-,-;M) denotes again a two-scale potential with M being a set of parameters controlling 
the drift and Ut denotes a standard two-dimensional Brownian motion. As with the one-dimensional 
case, we assume that the two-scale potential V{-, •; M) is given by a large scale as well as a fluctuating 
part, with the fluctuating part being separable: V{x,y \ M) = V{x; M)-\-pi{u)+p2{v) , with x,y G 
and y = {u,v)'^. Hence, the original system reads 

dxt = -{vV{xt;M) + i dt + V2^dUt, 

with xt = {xj, xf)'^ G M^. We take the large scale part to be a quadratic potential 

V{x; M) = ^x'^Mx , 

with M being symmetric and positive definite, so that the effective dynamics is given by 

dXt = -KMXt dt + V2cjK dWt , (33) 

for Xt G with analytic expressions for K = diag(A:i, ^2); see |46| . With pi{u) = cos(n) and 
P2{v) = cos(r;)/2 we find 

1 1 

ki = —— — and h 



h{l/aY ^ /o(l/(2a))2 ' 

where Iq{z) denotes again the modified Bessel function of the first kind. 

Since both identities in ([6]) have pendants for multivariate processes, the methodology introduced 
here can be readily applied to estimate both effective drift matrix A := KM and the effective 
diffusion matrix S := 2aK in (j33p . The only difference is that the system of equations corresponding 
to ([9]) and (jlip . respectively, is a matrix equation in this case, and similar techniques to obtain 
the best approximation (both formally and numerically) can be employed |48) . Figure [6] depicts 
the performance of the estimation scheme when applied to observations of the fast/slow system 



with M = (2 2), a = 3/2, and e = 0.1. Figure |6(a)| shows the estimated values of the four 
effective drift coefficients: As for the one-dimensional examples, the estimators also show here 
an approaching behavior towards the target values when increasing t = nh, yielding an accurate 
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estimate oi A = KM for t = nh sufficiently large. The same behavior when increasing t = nh is 
also observed for the estimated (diagonal) effective diffusion coefficient S = 2aK, as it is shown in 
Figure [6(b) [ We note that although the curves in Figure [6(b) | show a minor gap for larger t = nh, 
the relative error is less than 2% in both cases for t > 0.75. 



3.2.4 Truncated Burgers' Equation 

We consider here an appropriately rescaled variant of the stochastic Burgers' equation 



\{dl + l)ut + ^dxuf + i/ut] dt + -( 



on[0, tt] equipped with homogeneous Dirichlet boundary conditions. Under technical assumptions 
on the covariance operator Q of the space-time white noise W, one can show (see [T] and references 
therein for details) that the coefficients of the three-term truncated representation of the solution 
have to solve the following multiscale SDE 

dxt = {vxt - ^i^tvl + ytVt)) dt , 



dy] = {vyl - ^yl - ^{2xty^ - xt')) dt + ^ dV,^ 



dyt = {vy'i - -^yt + ^^tvl) dt + ^ dV^ , 

with Vi and being independent standard Brownian motions. Notice that noise acts only on the 
fast modes directly. For the truncated system standard homogenization theory applies and yields 



dXt = {AXt - BXt"") dt + ^Jaa + o^Xt^ dWt (34) 
as the effective dynamics with true parameters 

A = u -\ , B = — , cr„ = , and ah = • 

396 352 ' 12 ' 2112 ' " 36 

See [6] for details. Figure [7] shows the performance of the estimation scheme when applied to 
observations of the three dimensional fast/slow system with v = 1, {qi,q2) = (1, 1), and e = 0.1. 
Since the true values of the effective coefficients (dashed lines) are of different orders for these 
particular choices, a semi-logarithmic scale is adopted for the sake of clarity. The plots show 
qualitatively the same behavior as in the previous examples when increasing the final time t = nh 
and suggest that the estimation procedure yields accurate estimators. Only the estimated value 
cTq fluctuates around the true value. We note that the true value aa is very small (~ 5 • lO"'^) so 
that this coefficient has only marginal influence in the complete diffusion function. Furthermore, 
recall that even the time-step in the Euler-Maruyama discretization is larger {h = 10~^), so that 
the fluctuations are expected to be due to discretization errors. As a matter of fact, considering a 
finer discretization (i.e. increasing m,n, and N) reduces the fluctuations (not shown here). 



3.2.5 Fast chaotic noise 

Our methodology is also applicable to a system of ordinary differential equations (ODEs) where the 
stochastic noise is replaced by deterministic chaos. In particular, we will consider an ODE driven 
by one of the components of an appropriately rescaled Lorentz system. More precisely, consider as 
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Figure 7: Performance of the estimators A,B,aa, and ab in ()34p as functions of the final time t = nh 
with h = 0.001. 



an example the following system 



dx 
'dt 
dm 
dt 
dm 
dt 
dm 
dt 



x-^ + -{I + ux^)y2 



10, 



yi'j 



4(2%! 
1 



y2 - ym) , 



-TT = -^{yiy-i - ^y-i) 



(35a) 
(35b) 
(35c) 
(35d) 



where the fast component y = {yi, y2, yz)^ solves the Lorenz equation. In the sequel we investigate 
two different couplings between the fast and the slow process by choosing v G {0, 1}. 

According to |47| ch. 11.7.2] (see also [191 ex. 6.2]), when eliminating the fast chaotic variable y, 
the approximate dynamics for u = Q \s given by 



dXt = A{Xt- Xt^) dt + ^dWt, 
with A = \ and the diffusion coefficient that is given by the Green-Kubo formula 



2A" 



) 1 r-l 

hm - / r{yW^\y)dsdt. 



(36) 



(37) 



Therein ip^{y) = ■ y'*(y) with (/3*(y) denoting the solution of the fast process y at time t when 
e = \ and 62 = (0,1,0)"^. The convergence of the solution of ()35ap to the solution of ()36p can 
be justified rigorously using the recent results from |42) . However, the above expression for a is 
not useful practically: Not only does not it give an analytical value for a but using it to obtain a 
is computationally expensive. Hence, it would be advantageous to use the methodology proposed 
here and estimate the effective coefficients via observations of the complete (deterministic) fast / slow 
system. To illustrate numerically that our estimation procedure can indeed deal with this problem, 
we apply it to observations of the deterministic fast/slow system using A = 2/45 and = to 
estimate both the drift and the diffusion coefficients. Since the system is deterministic, classical 
solvers for ODEs may be employed. For example, depending on the stiffness (i.e. on e) of the system, 
either a fourth order Runge-Kutta scheme or a solver based on numerical differentiation formulas 
is used. Figure |8] depicts the estimated values as functions of the final time t = nh for two different 
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Figure 8: Performance of the estimation scheme 
A = 2/45, 1/ = 0, and e G {IQ-^/s^ iQ-i 
t = nh with h = 0.001 and the true effe( 



0.14 




nh [h = 10-^) 
(b) Estimator a 

applied to the deterministic system ()35p using 
The final time of the considered time series is 
five drift parameter is yl = 1. 



choices of the scale separation e € {10 10 ^}. The value e = 10 is the same with that 
used in |19) . thus allowing for direct comparisons to be made; we will return to this point shortly. 



The estimated drift parameter A (Figure 8(a) ) shows (for both values of e) the typical behavior we 
expect in the context of multiscale diffusions. While the estimator is biased for small values of nh, 
increasing nh significantly reduces the bias so that the estimator approaches the true value (dashed 
line) . The estimators of the effective diffusion coefficient a (Figure |8(b)P also approach a limiting 
value when increasing nh, with minor fluctuations for both values of £. In both plots, one observes 
a performance difference of the estimators for different values of e. In fact, the more distinctive the 
scale separation (i.e. the smaller e) between fast and slow components, the faster the estimators 
approach a limiting value. 

For both drift and diffusion estimator the curves for different values of e give slightly different 
values 

f 0.984 , if £ = 10-1 , . f 0.121 ,ife = 10-i 

A.£ ^ < „ ,„ and < , ,„ 

0.998 , if e = lO'^/^ ] o.l24 , if e = 10"^/^ 



at time nh = 2. Hence for e > there exists an additional bias, which is the reason for the 
observed difference in the estimators for different e. On the other hand, one expects that as e 
decreases the estimators of the effective coefficients become more accurate, as one observes here 
for the estimated drift A. Thus, a is also expected to be more accurate as e decreases. Since no 
analytic formula for the diffusion coefficient exists, the estimator a is compared with alternative 
numerical approximations available in |19| ex. 6.2 and ill. 10.5]. In the numerical experiments 
performed in this study the value e = 10~^/^ was adopted giving a value of 0.126 it 0.003 using 
Gaussian (second) moment approximations based on a modified Euler-Maruyama discretization of 
the effective dynamics, and a value of 0.13 it 0.01 based on the heterogeneous multiscale method 
(HMM) with a discretization of the Green-Kubo formula for the effective diffusion coefficient; see 
also |14| for more elaborated HMM based numerical schemes applied to the Lorenz 96 model. Thus, 
we have a very good agreement of the result obtained by the estimation proposed here with these 
previously reported values. It is worth to mention that, unlike the procedure introduced here, the 
methods employed to determine the effective diffusion coefficient in |19) assume that the effective 
drift parameter A is known. While the HMM can easily be adapted to the case of an unknown drift 
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Figure 9: Performance of the estimation scheme applied to the deterministic system ()35p using 
A = 2/45, V = 1, and e = 10"'^/^. The final time of the considered time series is t = nh 
with h = 0.001. 



parameter, it is not straightforward to incorporate the unknown drift parameter in the estimation 
based on Gaussian moment approximations. In any event, incorporating the drift estimation would 
yield an even larger statistical error for these methods, while the results based on the presented 
scheme show only minor fluctuations; see Figure [S] 

Choosing = 1 in (|35ap and following the methodology outlined in |47| ch. 11], yields the following 
effective dynamics 



dXt = {AXt + BXt"^ + CXt^) dt + ^Jaa + cJbXt^ + a^Xt^ dWt . (38) 

The effective coefficients are now given by 

A = 1 + a , B = a — 1 , C = 0,^ aa = cr , = 2a , ac = cr , 

with cr being as in ([37|) . We apply our methodology again to observations of the deterministic 
fast/slow system using A = 2/45, = 1, and e = 10"'^/^ to estimate all six effective coefficients. 
Figure [9] illustrates the estimated values of both drift and diffusion parameters as functions of the 
final time t = nh. We observe the procedure's typical behavior in the context of multiscale obser- 
vations. In fact, for both drift (Figure 9(a)) and diffusion parameters (Figure [9(b) P the estimators 



approach limiting values with only minor fluctuations when increasing nh. 



4 Conclusion 

We have developed a numerical methodology for estimating multiple parameters in a coarse-grained 
equation (in one or multiple dimensions) based on observation from an associated fast/slow system 
that posses a multiscale structure. This problem is far from straightforward, not only due to the 
multiscale effects present in the available data, but also due to difficulties associated with estimating 
parameters when both the drift and the diffusion coefficients are state dependent. 



^Although we know theoretically that C = 0, we estimate C nonetheless with the proposed scheme to illustrate that 
the novel scheme can correctly identify the relevant parameters in an model that contains more parameters than 
necessary, for instance given by a Taylor or Fourier series expansion. 
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The approach developed in this study combines a number of different techniques. On the one 
hand, the derivation of the estimators rehes on simple identities based on the martingale property 
for stochastic integrals and Ito isometry. On the other, we exploit our freedom in varying the initial 
condition in combination with standard techniques from inverse problems to define the parameter 
estimators via best approximation. 

We demonstrated via a detailed numerical study that the proposed inference scheme provides us 
with accurate estimates for parameters in both the drift and the diffusion coefficients in systems 
with multiscale structure and state dependent noise. In fact, the proposed methodology appears to 
be accurate and effective even when the stochastic noise in the system is replaced by deterministic 
chaos. 

While the feasibility study of the parameter estimation for multiscale diffusions and the initial 
presentation of the estimation scheme is the main development here, clearly many open problems 
and questions remain to be addressed. One such open question is the rigorous analysis of the 
algorithm to investigate its asymptotic properties and to scrutinize its limitations. Furthermore, 
the rigorous analysis of the algorithm is expected to reveal insights into the dependency on the 
algorithm-defining parameters that in turn can be used to reduce the computational complexity of 
the methodology. 

Clearly, from a practical point of view there are different strategies to improve the efficiency of the 
estimators. A first starting-point could be the usage of techniques with an accelerated convergence 
instead of the brute-force Monte Carlo sampling to approximate the involved expectations, e.g. 
quasi Monte Carlo |43) or variance reduction techniques |33t ch. 16]. Also recent work on Multilevel 
Monte Carlo methods |18| appears very appealing in this context, although care might have to be 
taken due to the nonlinear nature of the underlying model SDE; cf. |27) . 

Several questions also arise naturally within the presented framework of varying the initial con- 
dition to set up a system of equations: How many initial conditions need to be considered? Where 
to locate the initial conditions: Equispaced or distributed differently? How does the choice of 
the initial condition influence the accuracy of the estimator? In fact, preliminary numerical ex- 
periments suggest that an alternative distribution of the initial conditions improves the accuracy. 
Hence, an "optimal" distribution of the initial condition is expected to reduce the computational 
cost considerably. 

Another interesting point concerns the best approximation. Here, we defined the estimator as 
the element that minimizes the residual of a linear system of equations with respect to the Eu- 
clidean norm. Thus, natural questions are, e.g. as to whether an alternative norm might be more 
appropriate, and how regularizing the minimization problem (e.g. by a truncated singular value 
decomposition or a Tikhonov regularization) affects the estimator. Also methodologies that "opti- 
mize" the linear system of equations to obtain more accurate best approximations, as they are for 
instance used in the reconstruction of tomographic problems |39| . are appealing. 

From a computational point of view, employing parallel computing strategies seem also promising. 
In fact, the presented algorithm consists of multiple parts (mainly when generating the observa- 
tions) without mutual dependency. Thus, these parts lead to problems that are straightforward to 
parallelize. 

A further important question is whether it is possible to modify the scheme proposed here such 
that it relies on a single long time-series rather than multiple short simulations using different initial 
conditions. This would be of interest because in some application areas it is more convenient to 
provide data of one long simulation rather than multiple short ones. We will examine these and 
related issues in future studies. 
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